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1 Introduction 

Sequential change-point detection (or quickest change detection, or quickest "disorder" de- 
tection) is concerned with the design and analysis of techniques for quickest (on-line) de- 
tection of a change in the state of a phenomenon, subject to a tolerable limit on the risk of 
a false detection. Specifically, the substrate of the phenomenon is a time process that may 
unexpectedly undergo an abrupt change-of-state from "normal" to "abnormal", each defined 
as deemed appropriate given the physical context at hand. Inference about the current state 
of the process is drawn by virtue of (quantitative) observations (e.g., measurements). The 
sequential setting assumes the observations are made successively, and, so long as the be- 
havior thereof suggests the process is in the normal state, it is let to continue. However, if 
the state is believed to have altered, one's aim is to detect the change "as soon as possible", 
so that an appropriate response can be provided in a timely manner. Thus, with the arrival of 
every new observation one is faced with the question of whether to let the process continue, 
or to stop it and raise an alarm (and, e.g., investigate). The decision has to be made in real 
time based on the available data. The time instance at which the process' state changes is 
referred to as the change-point, and the challenge is that it is not known in advance. 

Historically, the subject of change-point detection first began to emerge in the 1920- 
1930's motivated by considerations of quality control. Shewhart's charts were popular in 
the past (see Shewhart 1931). Efficient (optimal and quasi-optimal) sequential detection pro- 
cedures were developed much later in the 1950-1960's, after the emergence of Sequential 
Analysis, a branch of statistics ushered by Wald (1947). The ideas set in motion by Shewhart 
and Wald have formed a platform for a vast literature on both theory and practice of sequen- 
tial change-point detection. See, e.g., Girschick and Rubin (1952), Page (1954), Shiryaev 
(1961, 1963, 1978), Roberts (1966), Siegmund (1985), Tartakovsky (1991), Brodsky and 
Darkhovsky (1993), Basseville and Nikiforov (1993), Poor and Hadjiliadis (2008). 

The desire to detect the change quickly causes one to be trigger-happy, which, on one 
hand, will lead to an unacceptably high level of the risk of sounding & false alarm - ter- 
minating the process prematurely as a result of an erroneous decision that the change did 
occur, while, in fact, it never did. On the other hand, attempting to avoid false alarms too 
strenuously will cause a long delay between the actual time of occurrence of the change (i.e., 
the true change-point) and the time it is detected. Hence, the essence of the problem is to 
attain a tradeoff between two contradicting performance measures - the loss associated with 
the delay in detection of a true change and that associated with raising a false alarm. A good 
sequential detection policy is expected to minimize the average loss related to the detection 
delay, subject to a constraint on the loss associated with false alarms (or vice versa). 

Putting this idea on a rigorous mathematical basis requires formal definition of both the 
"detection delay" and the "risk of raising a false alarm". To this end, contemporary theory of 
sequential change-point detection distinguishes four different approaches: the minimax ap- 
proach, the Bayesian approach, the generalized Bayesian approach, and the approach related 
to multi-cyclic detection of a distant change in a stationary regime. Alone, each has its own 
history and area(s) of application. This notwithstanding the four approaches are connected 
and fit together into one big picture shown in Figure 1 . 

The aim of this paper is to give a brief expose of the above four approaches to quickest 
change detection. Specifically, the plan is to assess the progress made to date within each 
with the emphasis on the novel exact and asymptotic optimality results. 
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Bayesian Approach 
(cliaiige-poiiit - random) 




I 



Minimax Approach 
(change-point - unknown) 



Fig. 1 Four approaches to sequential quickest change-point detection. 

2 Change-point models 

To formally state the general quickest change-point detection problem, we first have to in- 
troduce a change-point model as well as a model for the observations. To this end, a myriad 
of scenarios is possible; see, e.g., Fuh (2003, 2004), Lai (1995, 1998), Shiryaev (1961, 1963, 
1978, 2009, 2010), Tartakovsky (1991, 2009a), Tartakovsky and Moustakides (2010), Tar- 
takovsky and Veeravalli (2005). This section is intended to review the major ones. 

A change-point model is characterized by the probabilistic structure of the monitored 
process (independent, identically or non-identically distributed, correlated, etc.) as well as 
by that of the change-point (unknown deterministic, random completely or partially depen- 
dent on the observed data, random fully independent from the observations). 

Consider a probability triple (i7, T ^ P), where T = Mn'^oTn, Tn is the sigma-algebra 
generated by the first n ^ 1 observations (J^o = {0, ^2} is the trivial sigma-algebra), and 
P : T ^ [0, 1] is a probability measure. Let Pcxj and Pq be two mutually locally absolutely 
continuous (i.e., equivalent) probability measures; for a general case permitting singular 
measures to be present, see Shiryaev (2009). For d = {0, oo}, write P^^"^ = Pd|j^„ for the 
restriction of measure P^i to the sigma-algebra Tn, and let p^X\-) be the density of P^"^ 
(with respect to a dominating sigma-finite measure). 

Let {Xn}n^i denote the series of (random) observations; the series is defined on the 
probability space (J7, J^, P), and distribution-wise is such that for some time index v, the ob- 
servations Xi,X2, ■ ■ ■ ,X,y adhere to measure Poo ("normal" regime), butX^+i, Xi,-(-2, • • • 
follow measure Pq ("abnormal" regime). That is, at an unknown time instant ly (change- 
point), the observations undergo a change-of-regime from normal to abnormal. Note that i/ 
is the serial number of the last normal observation, so that if ^ = 0, then the entire series 
{Xn}n^i is in the abnormal regime admitting measure Pq, while ifu = oo, then {X„}„^i 
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is in the normal regime admitting measure Pea (i.e., there is no change) . Another practice 
popular in the literature is to define i/ as the serial number of the first post-change obser- 
vation. Although the two definitions map into one another, throughout the remainder of the 
paper we will follow the former convention. 

For every fixed v ^ 0, the change-of-regime in the series {X„}„^i gives rise to a new 
probability measure P^. We will now demonstrate how to construct the pdf p[/"''(Xi ) of 

(n) 

F]j for n ^ 1 and v ^ Oin the most general case. For the sake of brevity, we will omit the 
superscript and will write p^{Xi) in the following. 

For I i ^ j, let X' = {Xi, X^+i, . . . , Xj), that is, Xj is a sample of j— succes- 
sive observations indexed from i through j. Hence, if the sample X" = (Xi, X2, . . . , X„) 
is observed, then Xj = (Xi, . . . , Xfc) is the vector of the first k observations in this sample 
and XJ?_|_]^ = (Xfc+i, . . . , Xn) is the vector of the rest of the observations in the sample 
from fc -|- 1 to n. 

First, suppose u is deterministic unknown. This is the main assumption of the minimax 
approach; recall Figure 1. To get density piy(Xi ), observe that by the Bayes rule 

Poo(X?) =p^{Xi) X p^{X"+i\Xi) and po(X?) = poiXi) x po(X"+i|X'i'), 

whence by combining the first factor of the pre-change density, poo(Xi ), with the second 
one of the post-change density, po(X"), we obtain pi,(Xi) = pcx)(Xi ) xpo(X"+i[Xi ), 
or, after some more algebra using the Bayes rule. 



p.(xr)= j^np(i)(x,|xr')j X n pi'\x,\xi-')j , (1) 

where p^i^ {Xj |Xj~^) and pg''''(Xj jXj"^) are the conditional densities of the j-th obser- 
vation, Xj, given the past information Xj~^, j > 1. Note that in general these densities 

depend on j. Hereafter it is understood that nj=fc-i-i P^P ^-^i = 1 for fc > n. 

Model (1) is very general. It does not assume either independence or homogeneity 
of observations — the observations may be arbitrary dependent and nonidentically dis- 
tributed. Furthermore, in certain state-space models and hidden Markov models due to the 
propagation of the change-point the post-change conditional densities pQ''^\Xj\X]~^), 
V + \ ^ j n depend on the change-point v; see, e.g., Tartakovsky (2009a). Model (1) 
includes practically all possible scenarios. If, for example, there is a switch of one non-iid 
model to another non-iid model, which are mutually independent, then the two segments, 
pre- and post-change, of the observed process are independent, and in (1) the post-change 
conditional densities pp"'^ (Xj |X{~^), j ^ u + 1 are replaced withpQ"'^(Xj|Xi). 

Suppose now that the observations { X„ } „ ^ 1 are independent and such that X'l , . . . , X^ 
are each distributed according to a common density f{x), while Xi/+i,X^+2, ■ ■ ■ each 
follow a common density g{x) ^ f{x). This is the simplest and most prevalent case. For 
convenience, from now on it will be refemd to as the iid case, or the iid model. It can be 
seen that in this case, model (1) reduces to 



p.(X?) = W /(X,) X n 9{X,) , (2) 



and it will be referenced repeatedly throughout the paper. 
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If the change-point, v, is random, which is the ground assumption of the Bayesian ap- 
proach (see Figure 1), then the model has to be supplied with the change-point's prior distri- 
bution. There may be several change-point mechanisms and, as a result, a random variable v 
may be dependent on the observations or independent from the observations. To account for 
these possibilities at once, let ttq = ^ 0) and iTn = V(y = njX" ), n > 1, and observe 
that the series {7r„}„^o is {-Fn} -adapted. That is, the probability of the change occurring 
at time instance v = k depends on X\, the observations' history accumulated up to (and 
including) time moment A; ^ 1. With the so defined prior distribution one can describe very 
general change-point models, including those that assume is a {J^ri}-adapted stopping 
time; see Moustakides (2008). 

To conclude this section, we note that when the probability series {7r„}n^o depends 
on the observed data {Xn}n^i, it is argumentative whether {iTn^n^Q can be referred to as 
the change-point's prior distribution: it can just as well be viewed as the change-point's a 
posteriori distribution. However, a deeper discussion of this subject is out of scope to this 
paper, and from now on, we will assume that {7r„}„^o do not depend on {Xn}n^i, in 
which case it represents the "true" prior distribution. 

3 Overview of optimality criteria 

Contemporary theory of change-point detection is an ensemble of the Bayesian approach, 
the generalized Bayesian approach, the minimax approach, and the approach related to 
multi-cyclic detection of a disorder in a stationary regime; see Figure 1. The object of this 
section is to briefly discuss each problem setting. 

A sequential detection procedure is a stopping time T adapted to the filtration { J-Vi}n^o 
induced by the observations {Xn)n^i, i.e., the event {T ^ n} € Tn for every n ^ 0. 
Therefore, after observing Xi , . . . , Xt it is declared that the change is in effect. That may 
or may not be the case. If it is not, then T ^ u, and it is said that a false alarm has been 
sounded. Also, note that since J^o is the trivial sigma-algebra, any {J>t}-adapted stopping 
time T is either strictly positive with probability (w.p.) 1, or T = w.p. 1. The latter case is 
clearly degenerate, and to preclude it, from now on we shall assume T > w.p. 1. 

Common to the Bayesian, generalized Bayesian, and minimax approaches is that the 
detection procedure is applied only once; the result is either a false alarm, or a correct (may 
be delayed) detection. Irrespectively, what takes place beyond the stopping point T is of 
no concern. We will refer to this as the single-run paradigm, which is shown in Figure 2. 
Figure 2(a) shows an example of the behavior of a certain process of interest as exhibited 
through the sequence of observations {Xn}n^i - It can be seen that the process undergoes a 
shift in the mean at some time instant u, the change-point. Figure 2(b) (red trajectory) gives 
an example of the corresponding detection statistic trajectory that exceeds the detection 
threshold prematurely, i.e., before the change occurs. This is a false alarm situation, and T 
can be regarded as the (random) run length to the false alarm. Another possibility is shown 
in Figure 2(b) (blue trajectory). This is an example where the detection statistic exceeds 
the detection threshold past the change-point. Note that the detection delay, captured by the 
difference T — random. 

In a variety of surveillance applications the detection procedure should be applied re- 
peatedly. This requires specification of a renewal mechanism after each alarm (false or true). 
The simplest renewal strategy is to restart from scratch, in which case the procedure be- 
comes multi-cyclic with similar cycles (in a statistical sense) if the process is homogeneous. 
In the following sections, we will consider such an approach related to detection of a distant 



6 



Polimcheiiko and Tartakovsky 







Normal (Pre 


Change} Regime 




^ Y ^ 


ta, X, 










a 










Abnormal (Post-Change) Regime 


a 










Chaiige-Point 


O 




Surveillance Starts 

/ 






(possibly random) 






Time, n 



(a) An example of the behavior of a phenomenon (process) of interest as exhibited through the 
sequence of observations {Xn}n^i. 



Detection Delay, T — u, where; T > u 
(random) 




Time, n 



(b) Two possible scenarios of the corresponding detection process: false alarm (red trajectory) and 
correct detection (blue trajectory). 

Fig. 2 Single-run sequential change-point detection. 



change in a stationary regime, assuming that the detection procedure is applied repeatedly 
starting anew after each time the detection statistic exceeds the threshold. 



3.1 Bayesian formulation 

The signature feature of the Bayesian formulation is the assumption that the change-point is 
a random variable possessing a prior distribution. This is instrumental in certain applications 
(see Shiryaev 2006, 2010, or Tartakovsky and Veeravalli 2005), but mostly of interest since 
the limiting versions of Bayesian solutions lead to useful procedures, which are optimal or 
asymptotically optimal in more practical minimax problems. 

Let {TTfcjfc^o be the prior distribution of the change-point, v, where ttq = ¥{u ^ 0) and 
7i"fc = = k) for k ^ I. From the Bayesian point of view, the risk of sounding a false 
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alarm is reasonable to measure by the Probability of False Alarm (PFA), which is defined as 

oo 

PFA"(T) = P"(T s£ ^) = ^ 7rfcPfc(T sC fe), (3) 

oo 

where P'^(^) = Yl '^k'PkiA) and the tt in the superscript emphasizes the dependence 

fc=0 

on the prior distribution. Note that summation in (3) is over fc ^ 1 since by convention 
^kiT S5 1) = 1, so that Pfc(T SC 0) = 0. The most popular and practically reasonable way 
to benchmark the detection delay is through the Average Detection Delay (ADD), which is 
defined as 

ADD''(T) = E''[T - iy\T > i^] = E''[(T - i^)+]/P''(T > i^), (4) 

where hereafter = max{0, x} and E'^ denotes expectation with respect to P'^. 

We are now in a position to formally introduce the notion of Bayesian optimality. Let 
Aa = {T: PFA'^(T) SC a} be the class of detection procedures (stopping times) for 
which the PFA does not exceed a preset (desired) level a € (0, 1). Then under the Bayesian 
approach one's aim is to 

find Topt e such that ADD''(Topt) = inf ADD''(T) for every a G (0, 1). (5) 

For the iid model (2) and under the assumption that the change-point u has a geometric 
prior distribution this problem was solved by Shiryaev (1961, 1963, 1978). Specifically, 
Shiryaev assumed that u is distributed according to the zero-modified geometric distribution 

< 0) = TT and P(i/ = n) = (1 - 7r)p(l - p)", n > 0, (6) 

where n 6 [0, 1) and p G (0, 1). This is equivalent to choosing the series {7r„}„^o as 
TTo = P(j^ < 0) = TT -h (1 - 7r)p and 7r„ = r{v = n) = (1 - 7r)p(l - p)", n > 1. 

Observe now that if a ^ 1 — tt, then problem (5) can be solved by simply stopping right 
away. This clearly is a trivial solution, since for this strategy the ADD is exactly zero, and 
PFA''(T) = P(i/ > 0) = 1 - TT, so that the constraint PFA''(T) ^ a is satisfied. There- 
fore, to avoid trivialities we have to assume that a < 1 — tt. In this case, Shiryaev (1961, 
1963, 1978) proved that the optimal detection procedure is based on testing the posterior 
probability of the change currently being in effect, P(i/ < n| against a certain detection 
threshold. The procedure stops as soon as P(i/ < n\J^n) exceed the threshold. This strategy 
is known as the Shiryaev procedure. To guarantee its strict optimality the detection threshold 
should be set so as to guarantee that the PFA is exactly equal to the selected level a, which 
is rarely possible. 

The Shiryaev procedure will play an important role in the sequel when considering non- 
Bayes criteria as well. It is more convenient to express Shiryaev's procedure through the 
average likelihood ratio (LR) statistic 

where A„ = g{Xn) / f{Xn) is the "instantaneous" LR for the n-th data point, Xn- Indeed, 
by using the Bayes rule, one can show that 
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whence it is readily seen that "thresholding" the posterior probability < n\J^„) is the 
same as "thresholding" the process {i?7i,p}n>i- Therefore, the Shiryaev detection procedure 
has the form 

Ts{A) = inf{n > 1 : R„,p > A}, (9) 

and if A = Aa can be selected in such a way that the PFA is exactly equal to a, i.e., 
PFA'"(Ts(Aq)) = a, then it is strictly optimal in the class A{a), that is, 

inf ADD'^fT) = ADD''(Ts(A„)) for any < a < 1 - tt. 

TeA{a) 

Note that Shiryaev's statistic Rn,p can be rewritten in the recursive form 

Rn,p = {1 + Rn-l,p)-^, with Ro,p = (10) 

1 — P (1 — TTjp 

We also note that (7) and (8) remain true under the geometric prior distribution (6) 
even in the general non-iid case (1), with A„ = However, 
in order for the recursion (10) to hold in this case, {An}n^i should be independent of the 
change-point. 

As p — > 0, where p is the parameter of the geometric prior (6), the Shiryaev detection 
statistic (10) converges to what is known as the Shiryaev-Roberts (SR) detection statistic. 
The latter is the basis for the so-called SR procedure. As we will see, the SR procedure is a 
"bridge" between all four different approaches to change-point detection mentioned above. 

For a general asymptotic Bayesian change-point detection theory in discrete time that 
covers practically arbitrary non-iid models and prior distributions, see Tartakovsky and Veer- 
avalli (2005). Specifically, this work addresses the Bayesian approach assuming only that the 
prior distribution is independent of the observations. The overall conclusion made by the au- 
thors is two-fold: a) the Shiryaev procedure is asymptotically (as a — > 0) optimal in a very 
broad class of change-point models and prior distributions, and b) depending on the behavior 
of the prior distribution at the right tail, the SR procedure may or may not be asymptotically 
optimal. Specifically, if the tail is exponential, the SR procedure is not asymptotically op- 
timal, though it is asymptotically optimal if the tail is heavy. When the prior distribution 
is arbitrary and depends on the observations, we are not aware of any strict or asymptotic 
optimality results. 

3.2 Generalized Bayesian formulation 

The generalized Bayesian approach is the limiting case of the Bayesian formulation, pre- 
sented in the preceding section. Specifically, in the generalized Bayesian approach the change- 
point u is assumed to be a "generalized" random variable with a uniform (improper) prior 
distribution. 

First, return to the Bayesian constrained minimization problem (5). Specifically, con- 
sider the iid model (2) and assume that the change-point v is distributed according to zero- 
modified geometric distribution (6). Then the Shiryaev procedure defined in (10) and (9) is 
optimal if the threshold A = Aa is chosen so that PFA'^(Ts(yiQ)) = a. Suppose now that 
TT = and p — > 0; this is turning the geometric prior (6) to an improper uniform distri- 
bution. It can be seen that in this case {Rn,p}n^o becomes {Rn,o}n^Oy where i?o,o = 
and Rn^o = (1 + ii„-i,o) A„, n > 1 with A„ = g{Xn)/ f{Xn). The limit {Rn,o}n^o is 
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known as the SR statistic, and is customarily denoted as {Rn}n^o, i-c, Rn = Rn,o for ail 
n > 0; in particular, note that Rq = 0. 

Next, when tt = and p — > it can also be shown that 

^(^>^) ^E.[T]and^H(^^^f;K.[(T-fc)+], (11) 



P P 

where T is an arbitrary stopping time. As a result, one may conjecture that the SR procedure 
minimizes the Relative Integral Average Detection Delay (RIADD) 

, , Er nIEfc[(T- fc)+] 

RIADD(T) = j^^l^j ^ (12) 

over all detection procedures for which the Average Run Length (ARL) to false alarm, 
Eoo[T], is no less than 7 > 1, an a priori set level. 
Let 

Z\(7) = {T: Eoo[T] >7}, (13) 

be the class of detection procedures (stopping times) for which the ARL to false alarm 
Eoo[T] is "no worse" than 7 > 1. Then under the generalized Bayesian formulation one's 
goal is to 

find Topt G AM such that RIADD(Topt) = inf RIADD(T) for every 7 > 1. 

TeA(-y) 

(14) 

We have already hinted that this problem is solved by the SR procedure. This was for- 
mally demonstrated by Pollak and Tartakovsky (2009b) in the discrete-time iid case, and 
by Shiryaev (1963) and Feinberg and Shiryaev (2006) in continuous time for detecting a 
shift in the mean of a Brownian motion. 

We conclude this subsection with two remarks. First, observe that if the assumption 
TT = is replaced with tt = rp, where r ^ is a fixed number, then, as p — > 0, the 
Shiryaev statistic {Rn,p}n^o converges to {Rn}n^o^ where R^ = (1 + Rn-i) ^n, n ^ I 
with Rq = r ^ 0. This is the so-called Shiryaev-Roberts-r (SR-r) detection statistic, and 
it is the basis for the SR-r detection procedure that starts from an arbitrary deterministic 
point r. This procedure is due to Moustakides et al (2011). The SR-r procedure possesses 
certain minimax properties (cf. Polunchenko and Tartakovsky (2010) and Tartakovsky and 
Polunchenko (2010)). We will discuss this procedure at greater length later. 

Secondly, though the generalized Bayesian formulation is the limiting (as p — > 0) case 
of the Bayesian approach, it may also be equivalently re-interpreted as a completely differ- 
ent approach - multi-cyclic disorder detection in a stationary regime. We will address this 
approach in Subsection 3.4. 



3.3 Minimax formulation 

Contrary to the Bayesian formulation the minimax approach posits that the change-point is 
an unknown not necessarily random number. Even if it is random its distribution is unknown. 
The minimax approach has multiple optimality criteria. 

First minimax theory is due to Lorden (1971) who proposed to measure the risk of 
raising a false alarm by the ARL to false alarm Eoo [T] (recall the false alarm scenario 
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from Figure 2(b)). As far as the risk associated with detection delay is concerned, Lorden 
suggested to use the "worst-worst-case" ADD defined as 

Jt(T)= sup |esssupE,[(T-i/) + |J-,]|. (15) 

Lorden's minimax optimization problem seeks to 

find Topt e A{-f) such that Ji^{Topt) = inf Jl{T) for every 7 > 1, (16) 

where A('y) is the class of detection procedures with the lower bound 7 on the ARL to false 
alarm defined in (13). 

For the iid scenario (2), Lorden (1971) showed that Page's (1954) Cumulative Sum 
(CUSUM) procedure is first-order asymptotically minimax as 7 — ^ 00. For any 7 > 1, 
this problem was solved by Moustakides (1986), who showed that CUSUM is exactly opti- 
mal (see also Ritov (1990) who reestablished Moustakides' (1986) finding using a different 
decision- theoretic argument). 

Though the strict JL(r)-optimality of the CUSUM procedure is a strong result, it is 
more natural to construct a procedure that minimizes the average (conditional) detection 
delay, Ei. [T — i/jT > , for all > simultaneously. As no such uniformly optimal proce- 
dure is possible, PoUak (1985) suggested to revise Lorden's version of minimax optimality 
by replacing J7l {T) with 

Jp(T) = sup E^[T- i/|T > j^], (17) 

the worst conditional expected detection delay. Thus, PoUak's version of the minimax opti- 
mization problem seeks to 

find Topt e A{j) such that Jp(Topt) = inf Jp(T) for every 7 > 1. (18) 

It is our opinion that Jp(T) is better suited for practical purposes for two reasons. 
First, Lorden's criterion is effectively a double-minimax approach, and therefore, is overly 
pessimistic in the sense that j7p(T) ^ Jl(T). Second, it is directly connected to the con- 
ventional decision theoretic approach — the optimization problem (18) can be solved by 
finding the least favorable prior distribution. More specifically, since by the general decision 
theory, the minimax solution corresponds to the (generalized) Bayesian solution with the 
least favorable prior distribution, it can be shown that sup^ ADD'^(T) = j7p(T), where 
ADD'^(T) is defined in (4). In addition, unlike Lorden's minimax problem (16), PoUak's 
minimax problem (18) is still not solved. For these reasons, from now on, when consider- 
ing the minimax approach, we focus on PoUak's supremum ADD measure Jp(T). Some 
light as to the possible solution (in the iid case) is shed in the work of Polunchenko and 
Tartakovsky (2010), Tartakovsky and Polunchenko (2010), and Moustakides et al (2011). A 
synopsis of the results is given in Sections 7 and 8. 

We conclude this section with presenting yet another way to gauge the risk of raising 
a false alarm, namely, by means of the worst local (conditional) probability of sounding a 
false alarm within a time "window" of a given length. As argued by Tartakovsky (2005, 
2008), in many surveillance applications (e.g., target detection) this probability is a better 
option than the ARL to false alarm, which is more global. Specifically, let 

Z\^ = iT: supPoc(fe < T ^ fc + m|T > fc) al, (19) 
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be the class of detection procedures for which Poo (k < T ^ k + m\T > k), the conditional 
probability of raising a false alarm inside a sliding window of m ^ 1 observations is "no 
worse" than a certain a priori chosen level a € (0, 1). The size of the window m may either 
be fixed or go to infinity when a — > 0. 

Let T be the stopping time associated with a generic detection procedure. The appropri- 
ateness of the ARL to false alarm Ecx^ [T] as an exhaustive measure of the risk of raising a 
false alarm is questionable, unless the Poo -distribution of T is geometric, at least approxi- 
mately; see Tartakovsky (2005, 2008). The geometric distribution is characterized entirely 
by a single parameter, which a) uniquely determines Eoo \T], and b) is uniquely determined 
by Eoo \T]. As a result, if T is geometric, one can evaluate Poo(fc < T ^ k -\- m\T > k) 
for any fc ^ (in fact, for all fc ^ at once). 

For the iid model (2), PoUak and Tartakovsky (2009a) showed that under mild assump- 
tions the Poo -distribution of the stopping times associated with detection schemes from a 
certain class is asymptotically (as 7 — > cxd) exponential with parameter 1 / Eoo \T] ; the con- 
vergence is in the U' sense, where p J$ 1. See also Tartakovsky et al (2008). The class 
includes all of the most popular procedures. Hence, for the iid model (2), the ARL to false 
alarm is an acceptable measure of the false alarm rate. However, for a general non-iid model 
this is not necessarily true, which suggests that alternative measures of the false alarm rate 
are in order. 

As argued by Tartakovsky (2005), in general, sup;. Poo (A; < T ^ fc + m\T > k) ^ a 
is a stronger condition than Eoo[T] > 7. Hence, in general, C ^(7). See also Tar- 
takovsky (2009b). In Section 8 we take the work of Polunchenko and Tartakovsky (2010) 
and Tartakovsky and Polunchenko (2010) one step further and present a procedure that 
solves the optimization problem (18) in the class (19) in a specific example. 

3.4 Multi-cyclic detection of a disorder in a stationary regime 

Common to all of the above approaches is that the detection procedure is applied only once. 
This is the single-run paradigm. The result is either a correct (though usually delayed) detec- 
tion or a false alert; recall Figure 2. Yet another formulation may be derived by abandoning 
the single-run paradigm for the multi-run or the multi-cyclic one. 

Specifically, consider a context in which it is of utmost importance to detect the change 
as quickly as possible, even at the expense of raising many false alarms (using a repeated 
application of the same stopping rule) before the change occurs. This is equivalent to saying 
that the change-point v is substantially larger than the tolerable level of false alarms 7. 
That is, the change "strikes" in a distant future and is preceded by a stationary flow of false 
alarms. This scenario is schematically shown in Figure 3. As one can see, the ARL to false 
alarm in this case is the mean time between (consecutive) false alarms, and therefore may 
be thought of the false alarm rate (or frequency). 

As argued by PoUak and Tartakovsky (2009b), the multi-cyclic approach is instrumental 
in many surveillance applications, in particular in the areas concerned with intrusion/anomaly 
detection, e.g., cybersecurity and particularly detection of attacks in computer networks. 

Formally, let Ti , T2 , . . . denote sequential independent repetitions of the same stopping 
time T, and let 7(j) = T(i) -|- T(2) -!-••• + T^j) be the time of the j-th alarm. Define 
lu = min{j ^ 1: T(j) > v}. Put otherwise, 7(7„) is the time of detection of the true 
change that occurs at the time instant v after 1^ — 1 false alarms have been raised. Write 



Jst{T) = lim E,[7^,„) -i.] 



(20) 



o 



Normal (Pre-Change) Regime 

A 



V 

Abnormal (Post-Change) Regime 



<[[[^^^tart of Surveillance~~^^]]]> 

(a) An example of the behavior of a process of interest as exhibited through the series of observations {Xn}n^i . 



Run Length to False Alann, T(-2j 
(random) 



Detection Delay, 7^/^^) — i^, where 7(7^) > 
(random) 





Time, 



1 



^^^^^^^^ange-Poinr^^^ 



Time, n 



Run Length to False Alarm, T^^^j 
(random) 



(b) An example of the behavior of the detection statistic when the decision to terminate surveillance is made past the change-point. 
Fig. 3 Multi-cyclic change-point detection in a stationary regime. 
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for the limiting value of the ADD that we will refer to as the stationary ADD (STADD). 
We are now in a position to formalize the notion of optimality in the multi-cyclic setup: 

find Topt eA{j) such that Jst {Topt ) = inf Jst (T) for every 7 > 1 (21) 

(among all multi-cyclic procedures). 

For the iid model (2), this problem was solved by Pollak and Tartakovsky (2009b), who 
showed that the solution is the multi-cyclic SR procedure by arguing that JJst{T) is the 
same as RIADD(T) defined in (12). This suggests that the optimal solution of the problem 
of multi-cyclic change-point detection in a stationary regime is completely equivalent to the 
solution of the generalized Bayesian problem. The exact result is stated in the next section. 



4 Optimality properties of the Shiryaev-Roberts detection procedure 

From now on we will confine ourselves to the iid scenario (2), i.e., we assume that a) the 
observations {Xn}n^i are independent throughout their history, and b) Xi, . . . ,Xi, are 
distributed according to a common known pdf f{x) and X^+i, Xv+2, ... are distributed 
according to a common pdf g{x) ^ f{x), also known. 

Let Hk : V = k fox k < 00 and %oo : v = oohe,, respectively, the hypotheses that 
the change takes place at the time moment = fc, fe > 0, and that no change ever occurs. 
The densities of the sample X" = (Xi , . . . , X„), n > 1 under these hypotheses are given 
by 

n 

p[X^\'Hoo) = Y{f{X,), 

A: n 

p(X?|?^fc) = J]/(X,) W g{X,) forfe<n, 

]=1 j=k+l 

andp(X"|?^oo) = piXilUk) for fc ^ n, so that the corresponding LR is 

where A„ = g{Xn) / f {Xn) is the "instantaneous" LR for the n-th observation Xn. 

To decide in favor of one of the hypotheses Hk or "Hoc, the likelihood ratios are then 
"fed" to an appropriate sequential detection procedure, which is chosen according to the 
particular version of the optimization problem. In this section we are interested in the gen- 
eralized Bayesian problem stated in (14) and in the multi-cyclic disorder detection in a 
stationary regime stated in (21). We have already remarked that for the iid model in ques- 
tion the SR procedure solves both these problems. We preface the presentation of the exact 
results with the introduction of the SR procedure. 
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4. 1 The Shiryaev-Roberts procedure 

The SR procedure is due to the independent work of Shiryaev (1961, 1963) and Roberts 
(1966). Specifically, Shiryaev considered the problem of detecting a change in the drift of 
a Brownian motion; Roberts focused on the case of detecting a shift in the mean of an iid 
Gaussian sequence. The name Shiryaev-Roberts was given by PoUak (1985), and it has 
become the convention. 

Formally, the SR procedure is defined as the stopping time 

Sa = inf{n > 1 : i?„ > A}, (22) 

where A > is the detection threshold, and 

i?„ = (l + i?„-i)A„, 71^1 with Rq=0 (23) 

is the SR detection statistic. As usual, we set inf{0} = oo, i.e., Sa = cx) if Rn never 
crosses A. 



4.2 Optimality properties 

Recall first that Rn = limp_>o Rn,p, where Rn,p is the Shiryaev statistic given by recur- 
sion (10). Recall also that the limiting relations (11) hold. These facts allow us to conjecture 
that the SR procedure is optimal in the generalized Bayesian sense. In addition, as we stated 
in Subsection 3.4, the RIADD is equal to the STADD of the multi-cyclic procedure, so that 
we expect that the repeated SR procedure is optimal for detecting distant changes. The exact 
result is due to Pollak and Tartakovsky (2009b) and is given next. 

Theorem 1 (Pollak and Tartakovsky 2009b) Let Sa be the SR procedure defined by 
(22) and (23). Suppose the detection threshold A = A^ is selected from the equation 
^oc[Sa^] = 7, where 7 > 1 w the desired level of the ARL to false alarm. 

(i) Then the SR procedure Sa., minimizes RIADD(T) = X^^q Efc \{T — A;)'*']/ Eoo \T\ 
over all stopping times T that satisfy Eoo \T] ^ 7, that is, 

RIADD(5aJ = ^ inf ^RIADD(T) for every 7 > 1. 

(ii) For any stopping time T, RIADD(T) = J^st{T). Therefore, the SR procedure Sa., 
minimizes the stationary average detection delay among all multi-cyclic procedures in 
the class ^(7), i-e., 

Jst{SaJ = inf Jst{T) for every 7 > 1. 

Tezi(7) 

It is worth noting that the ARL to false alarm of the SR procedure satisfies the inequality 
Eoo [Sa] ^ A for all ^4 > 0, which can be easily obtained by noticing that Rn — n is a 
Pcx) -martingale with mean zero. Also, asymptotically (as A 00), Eoo [Sa] ~ ^/C' where 
the constant < C < 1 is given by (32) below (see Pollak 1987). Hence, setting A^ = j( 
yields Eoc[Sa.,] » 7, as 7 ^> 00. 
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5 Optimal and nearly optimal minimax detection procedures 

In this section, we will be concerned exclusively with the minimax problem in PoUak's 
setting (18), assuming that the change-point v is deterministic unknown. As of today, this 
problem is not solved in general. As has been indicated earlier, the usual way around this is to 
consider it asymptotically by allowing the ARL to false alarm 7 — > oo. The hope is to design 
such procedure T* € ^(7) that Jp(T*) and the (unknown) optimum infxgzi(7) J^?{T) 
will be in some sense "close" to each other in the limit, as 7 — >^ 00. To this end, the following 
three different types of asymptotic optimality are usually distinguished. 

Definition 1 (First-Order Asymptotic Optimality) A procedure T* e ^(7) is said to be 
first-order asymptotically optimal in the class A(pj) if 

^ — — 1 + 0(1), as 7 — > 00, 



infT64(7)^p(T) 
where o(l) — > as 7 — > cxd. 

Definition 2 (Second-Order Asymptotic Optimality) A procedure T* G ^(7) is said to 
be second-order asymptotically optimal in the class A(j) if 

Jp(T*)- inf Jp(T) =0(1), as 7^00, 

T6/i(7) 

where 0(1) stays bounded as 7 —> cxd. 

Definition 3 (Third-Order Asymptotic Optimality) A procedure T* G ^(7) is said to 
be third-order asymptotically optimal in the class Z\(7) if 

Jp{T*)- inf Jp(T) = o(l), as 7^00. 

TgA(7) 



5.1 The Shiryaev-Roberts-Pollak procedure 

The question of what procedure minimizes PoUak's measure of detection delay Jp (T) is 
an open issue. As an attempt to resolve the issue, PoUak (1985) proposed to "tweak" the 
SR procedure (22). This led to the new procedure that we will refer to as the Shiryaev- 
Roberts-Pollak (SRP) procedure. To facilitate the presentation of the latter, we first explain 
the heuristics. 

As known from the general decision theory (see, e.g., Ferguson 1967, Theorem 2.1 1.3), 
an -adapted stopping time T solves (18) if a) T is an extended Bayes rule, b) it is an 
equalizer, and c) it satisfies the false alarm constraint with equality. A procedure is said 
to be an equalizer if its conditional risk (which we measure through Ev[T — v\T > i/]) 
is constant for all ^ 0, that is, Eo[T] = E„[T - iy\T > v] for all > 1. Of the three 
conditions the one that requires T to be an equalizer poses the most challenge. Pollak (1985) 
came up with an elegant solution. 

It turns out that the sequence £^[5^ — v\Sa > i^] indexed by v eventually stabi- 
lizes, i.e., it remains the same for all sufficiently large v; see Figure 4 below. This happens 
because the SR detection statistic enters the quasi-stationary mode, which means that the 
conditional distribution Poo(^?n ^ x\Sa > n) no longer changes with time. If one could 
get to the quasi-stationary mode immediately, then the resulting procedure would have the 
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same expected conditional detection delay for all v ^ 0, i.e., it would be the equalizer. 
Thus, PoUak's (1985) idea was to start the SR detection statistic {Rn}n^o, defined in (23), 
not from zero (Rq = 0), but from a random point Rq = Rq , where R^ is sampled from 
the quasi-stationary distribution of the SR statistic under the hypothesis T-Lao (which is a 
Markov Harris-recurrent process under 'Hoc)- Specifically, the quasi-stationary cdf QAix) 
is defined as 

Qa{x) = lim Voo{Rn ^x\SA>n). (24) 

n—^oo 

Therefore, the SRP procedure is defined as the stopping time 

5^ =inf{n> 1: i?^ > A}, (25) 
where A > is the detection threshold, and 

i?^ = (l + i?^_i)A„, n>l, R^^Qa{x) (26) 

is the detection statistic. 

We reiterate that, by design, the SRP procedure (25) and (26) is an equalizer: it delivers 
the same conditional average detection delay for any change-point i/, that is, Eo[iS^] = 
E^[5^ - iy\S'^ > v] for all v ^ I. 

PoUak (1985) was able to demonstrate that the SRP procedure is third-order asymptoti- 
cally optimal with respect to j7p(T). More specifically, the following is true. 

Theorem! (Pollak 1985) Le^Eo [(log Ai)+] < oo. Suppose that in the SRP procedure 
the detection threshold A = Aj is selected in such a way that Eoo ] = 7. Then 

Jp[S'l)= inf Jp(T)-ho(l), flj 7^cx). 

Recently, Tartakovsky et al (2011) obtained the following asymptotic approximation for 
J7p(5^) under the second moment condition Eo[log Ai]^ < 00: 

^o[S^\ = j{\ogA + x-Coo) + o{\), as A ^00, 

where x is the limiting average overshoot in the one-sided sequential test which is a subject 
of renewal theory (see, e.g., Woodroofe 1982) and Coo is a constant that can be computed 
numerically (e.g., by Monte Carlo simulations). Both x and Coo are formally defined in the 
next subsection, where we reiterate the exact result of Tartakovsky et al (201 1). 
Note that for sufficiently large 7, 

rA 

Eoo [5^] « {A/0 - tiQ, where pq = / ydQA^v), (27) 

i.e., /iQ is the mean of the quasi-stationary distribution, and is a constant defined in (32) 
below. This approximation can be obtained by first noticing that for a fixed Rq = r the 
process R^ — r — n is a zero-mean Poo -martingale, and then applying optional sampling 
theorem to this martingale as well as a renewal theoretic argument (cf. Tartakovsky et al 
2011). 
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5.2 The Shiryaev-Roberts-r procedure 

The third-order asymptotic optimality of the SRP procedure makes the latter practically 
appealing. On the flip size, the SRP rule requires the knowledge of the quasi-stationary dis- 
tribution (24). It is rare that this distribution can be expressed in a closed form; for examples 
where this is possible, see Pollak (1985), Mevorach and PoUak (1991), Polunchenko and 
Tartakovsky (2010) and Tartakovsky and Polunchenko (2010). As a result, the SRP proce- 
dure has not been used in practice. 

To make the SRP procedure implementable, Moustakides et al (2011) proposed a nu- 
merical framework. More importantly, Moustakides et al (201 1) offered numerical evidence 
that there exist procedures that are uniformly better than the SRP procedure. Specifically, 
they regard starting off the original SR procedure at a fixed (but specially designed) Ro = r, 
^ r < A, and defining the stopping time with this new deterministic initialization. Be- 
cause of the importance of the starting point, they dubbed their procedure the SR-r proce- 
dure. 

Formally, the SR-r procedure is defined as the stopping time 

Sa = inf{n > 1 : i?; ^ yl}, (28) 
where A > is the detection threshold, and 

K = (1 + K.-i)An, n > 1, with Ro = r^O (29) 

is the SR-r detection statistic. 

Moustakides et al (2011) show numerically that for certain values of the starting point, 
Rq = r, apparently, Ey[5^^ — v\Sa^ > v\ is strictly less than E^[iS^^ — v\S'j^^ > v] for 
all ^ 0, where Ai and A2 are such that Eoo[iSJI^J = Eoo['5^^] (although the maximal 
expected delay is only slightly smaller for TJ^^). 

It turns out that using the ideas of Moustakides et al (2011) we are able to design the 
initialization point r = r(j) in the SR-r procedure (28) so that this procedure is also 
third-order asymptotically optimal. In this respect, the average delay to detection at infinity 
ADDoo {Sa) = linii/-i.oo E,^ [iS^ — i/jiS^ > i/\ plays the critical role. To understand why, let 
us look at Figure 4 which shows the average delay to detection E^ [Sa — v\S\ > v\vs.v for 
several initialization values i?^ = This figure was obtained using the integral equations 
and the numerical technique of Moustakides et al (201 1). For r = 0, this is the classical SR 
procedure (with i?o = 0) whose average delay to detection is monotonically decreasing to 
its minimum (steady state value) that is attained at infinity. Note that in fact this steady state 
is attained for essentially finite values of the change-point v. It is seen that there exists a 
value r = rA that depends on the threshold A for which the worst point v is at infinity, i.e., 
Jp(5^"*) = ADDoo ('S^'*). The value of rA is the minimal value for which this happens 
and it is also the value that delivers the minimum to the difference between Jp {Sa ) and the 
lower bound for vaireA^-^) J^v{T) derived by Moustakides et al (2011) and by Polunchenko 
and Tartakovsky (2010). This is a very important observation, since it allows us to build a 
proof of asymptotic optimality based on an estimate of ADDoo (5^). We also note that 
for the SR-r procedure with initialization r = ta (pink line) the average detection delay 
at the beginning and at infinity are approximately equal, Eo[iS^*] » ADDoo(5^*). This 
allows us to conjecture that an "optimal" SR-r is an equalizer at the beginning {v = 0) and 
at sufficiently large values of v, so that initialization rA should be selected to achieve this 
property. The following theorem, whose proof can be found in Polunchenko and Tartakovsky 
(2010), shows that the lower bound for the "minimax risk" can be expressed via the integral 
average detection delay of the SR-r procedure, which partially explains the issue. 
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\ 

Fig. 4 Typical behavior of the conditional expected detection delay E^[5^ — flS^ > u] of the SR-r 
procedure as a function of the change-point v for various initialization strategies. 



Theorem 3 Let be defined as in (28) and (29), and let A = be selected so that 
IEoo[5^^] = 7- Then, fi)r ewery r ^ 0, 

^ inf^^^p(T) , ^"-^^LWj = '''' 

Note that Theorem 3 suggests that if r can be chosen so that the SR-r procedure is an 
equalizer (i.e., Eo[iS^] = E^[iS^ — flS^ > v\ for all v > 0), then it is exactly optimal. 
This is because the right-hand side in (30) is equal to Eo[iS^], which, in turn, is equal to 
sup^ [iS^ — v\S\ > v\= J7p (iS^). Therefore, we have the following corollary that will 
be used in Section 7 for proving that the SR-r procedure with a specially designed r = r a 
is strictly optimal for two specific models. 

Corollary 1 Let A = Aj be selected so that Eoo['S^^,] = 7- Assume that r = r{j) is 
chosen in such a way that the SR-r procedure S^'^'' is an equalizer Then it is strictly 
minimax in the class i.e., 

inf Jp(r) = Jp(5;(^^). (31) 

While the SR-r is not strictly minimax in general, it is almost obvious that this proce- 
dure is almost minimax. In fact, Moustakides et al (201 1) conjecture that the SR-r procedure 
is third-order asymptotically minimax and Tartakovsky et al (201 1) show that this conjecture 
is true. We will state the exact result after we introduce some additional notation. 
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Let Sn = log Ai + ■ ■ ■ + log A„ and, for a ^ 0, introduce the one-sided stopping time 
Ta = infjri ^ 1 : 5n ^ a}. Let Ka = St^ — a be an overshoot (excess over the level a at 
stopping), and let 

x= lim EoM, C= lim Eq [e"''"] . (32) 

The constants x > and < C < 1 depend on the model and can be computed 
numerically. Next, let / = Eo[logAi] denote the KuUback-Leibler information number, 
and let Vcx> = Y^'jLi . Also, let R^o be a random variable that has the Poo -limiting 
(stationary) distribution of as n — > oo, i.e., Qst{x) = lim„_j.oo Poo(J^n a;) = 
Pcx)(i?oo sS x).Let 

Coo =E[log(l-hi?oo -hKo)] = / / \og{l + x + y)dQsT{x)dQ{y), (33) 

Jo Jo 

where =Po(yoo y). 

Theorem4 (Tartakovskyetal 2011) LerEo [log Ai]^ < oo and let log Ai be non-arithmetic. 
Then the following assertions hold. 

(i) 7 — > OO, 

inf Jp{T);?-[\ogijO + x-Coo]+o{l). (34) 
Te/i(7) i 

(ii) For any r ^ 0, 

ADDoo(>S^) =Eo[5^] = y(logyl + x-Coo) + o(l), as A ^oo. (35) 

(iii) Furthermore, if in the SR—r procedure A = A^ = ^C, and the initialization point 
r = 0(7) is selected so that Jp(5^) = ADDoo(5^), then EooK] = 7(1 + o(l)) 
and 

JpiS'A) = y [log(7C) + x-Co.]+ 0(1) as 7 ^ 00. (36) 
Therefore, the SR—r procedure is third-order asymptotically optimal: 

JpiS'A)- inf Jp(T) =0(1), fli 7^00. 

Te4(7) 

Also, 

ADDo{SA) = j[\ogA + x-C{r)] + o{l), as A ^ 00, (37) 

where 

/"OO 

C7(r)=E[log(l + r + yoo)]= / \og{l + r + y) dQ{y). (38) 

Jo 

As we mentioned above, it is desirable to make the SR-r procedure to look like equalizer by 
choosing the head start r, which can be achieved by equalizing ADDo and ADDoo. Com- 
paring (35) and (37) we see that this property approximately holds when r is selected from 
the equation C(r*) = Coo- This shows that asymptotically as 7 — > cxd the "optimal" value 
r* is a fixed number that does not depend on 7. Clearly, this observation is important since 
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it allows us to design tlie initialization point effectively and make the resulting procedure 
approximately optimal. 

It is worth mentioning that for the conventional SR procedure that starts from zero 

Jp(5a) = ADDo(5a) = y[logA + x-C(0)] + o(l), as A -,00. 

Therefore, the SR procedure is only second-order asymptotically optimal. For suffi- 
ciently large 7, the difference between the supremum ADD-s of the SR procedure and the 
optimized SR-r is given by (C(0) — Coo)//, which can be quite large if the KuUback- 
Leibler information number / is small. 

Note that similar to (27), for sufficiently large 7, 

Ec^[5^] « (A/C)-r. (39) 



Polunchenko and Tartakovsky (2010) and Tartakovsky and Polunchenko (2010) offer 
two scenarios where the SR-r procedure is strictly minimax. Both are discussed (and ex- 
tended) in Section 7. In addition, in Section 8 we present an example where distributions 
Qst{x) and Q{x) and the constants x, Coo, and C(r) can be computed analytically. 



6 Numerical performance evaluation 



Recall that each of the four approaches adduced above is characterized by its own optimal- 
ity criterion. Together they bring about a variety of performance measures. Hence, to judge 
about the efficiency of a detection procedure (with respect to one performance measure or 
another), one has to be able to compute the procedure's corresponding operating character- 
istics (OC). In this section, we present integral equations for a multitude of OC-s that are 
of interest in various problem settings (Bayesian, minimax, etc.) and practical applications. 
Usually these equations cannot be solved analytically and numerical techniques are needed; 
cf. Moustakides et al (2011); Tartakovsky et al (2009). 

Consider a generic detection procedure described by the stopping time 

ri =inf{n> 1: (40) 

where yl > is the detection threshold and {K^}n>o is a Markov detection statistic that 
satisfies the recursive relation 



V: = A„, n ^ 1 with Vo' = s^ 0, (41) 

where ^{x) is a positive-valued function and s is a fixed parameter referred to as the start- 
ing point or the "head start". Observe first that this detection scheme constitutes a rather 
broad class of detection schemes that includes, e.g., CUSUM, Shiryaev's procedure, SR-r, 
and EWMA (exponentially weighted moving average). Indeed, for the Shiryaev procedure 
^{V) = (1 + V)/{1 - p) and for the SR-r procedure S.{V) = l + V. (We will not consider 
other procedures such as CUSUM and EWMA in this paper). 

Let Pdit) = Pd(Ai ^ t) denote the cdf of the LR Ai under the measure P^, d = 
{0, 00}. Define 

ic4x,y) = ■^Mv:+i ^ y\v:: = x) = (-^] , d = {o,oo}, 
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where {Vn}n^o is as in (41). Here and in the following we assume that Ai is continuous 
under both hypotheses. 

Let l{x) = E^[Ta] and 5o{x) = Eo[Ta], where Ta is as in (40). Observe that e{x) 
and So{x) are conditional expectations of the form Ed[ • \ Vq = x] (for d = oo and d = 
respectively), where Vq = a; is the starting point of the generic detection statistic (41). It is 
not difficult to see that £{x) and 5o{x) are governed by the equations 

e{x) = l+ r JCoo{x,y)£{y)dy, (42) 
Jo 

and 

fA 

(43) 



5o{x) = l+ JCo{x,y)6o{y)dy, 
■Jo 



respectively; cf. Moustakides et al (201 1). 

Next, for any > 0, let 6^{x) = E^[(7I - and p^{x) = Poo(7I > i^)- 
By Moustakides et al (201 1), 

5u+i{x)=l JC^{x,y)S^{y)dy, p^+i{x) = K.^{x,y) py{y) dy, (44) 
Jo Jo 

where 5q{x) is governed by (43) and po{x) = 1 for all x, since ¥oo{Ta > 0) = 1. 
Consider now the conditional average delays to detection [TX ~v\Ta > j^] = E^ [(Ta — 
v)'^]/^ATX >v),v^ 0. Since V^{TX > v) = V^{TX > v), we obtain 

. ^ _ E.[(ri-^)+] _ 5.{x) 

where 5u{x) and Pv{x) are given by (44). Thus, the conditional average detection delays 
can be computed for any v 0, which allows one to evaluate sup^>Q Ei, [TX — v\Ta > v\ ■ 
Now, let ^{x) = J27=o^AiTA - = Er=o ^A^)- By Theorem 1, 

Jst{Ta) - RIADD(r^) ^-^^ j^, 

so that in order to compute the STADD J7st(7a) we have to be able to compute ipi^)- As 
shown by Moustakides et al (201 1), tpi^) is determined by the equation 

tp{x) = 5o{x) + K.oc{x,y)il){y)dy, (45) 
Jo 

where 5q(x) is governed by equation (43). 

Note that the lower bound (30) for the minimax risk given in Theorem 3 can be computed 

as 

^^^Ta)= ^^^^^^ , 

where £{x), So{x), and ipi^) are governed by equations (42), (43), and (45). 

The local conditional probabilities of false alarm Poo (7^ ^ k + ?ti|7a > k), k ^ 
inside a fixed "window" of size m = 1, 2, . . . can also be evaluated noting that Pcio(Ta 
k + m\TA > k) = I - pk+m{x)/ Pk(.x), where pk{x) are as in (44). Having Poo(T| ^ 
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k + m\TA > k) evaluated for sufficiently many fe's, one can easily find supj. VooTX ^ 
k + m\TA > k) for any fixed m. 

The next step is to extend the obtained equations to the case when TX is randomized 
similarly to the SRP procedure (25) and (26). To this end, let Qa (y) = linirn-oo Poo [V^ ^ 
uITa > n) be the quasi-stationary distribution. Note that this distribution does not depend 
on the starting point Vq = s and exists whenever the LR is continuous; cf. (Harris 1963, 
Theorem III. 10.1). 

It can be shown that the quasi-stationary pdf qA{x) = cIQa {x)/dx satisfies the equation 

>^AqA{y)= / qA{x)ICoo{x,y)dx, subject to / qA{x)dx = 1, (46) 
Jo Jo 

whence one can conclude that qA{x) is the left dominant eigenvector of the linear integral 
operator induced by the kernel ICoo{x, y), and € (0, 1) is the comsponding eigenvalue; 
cf. Moustakides et al (2011) and Pollak (1985). We also note that both qA{x) and \a are 
unique. 

Consider now 7^ defined as the above generic procedure Ta with the starting point 
being random and sampled from the quasi-stationary distribution. Specifically, 

=ini{n^l:V^ ^ A}, (47) 

where yl > is the detection threshold, and {K?}n>o is a generic detection statistic com- 
puted recursively 

= CCK?-!) A„, n > 1 with ~ Qa- (48) 

We note that the SRP procedure is the special case of Ta with $_{x) = I + x. 

Once qA{x) and \a are available, one can compute the ARL to false alarm and the 
detection delay (which is independent from the change-point) for this randomized variant 

of the generic procedure Ta ■ Indeed, 

,-A rA 

Eoo[ri^]= / e{x) qA{x) dx = — and Eo[T^] = / So{x) qA{x) dx. 

Jo 1 - Jo 

To understand the second equality in the formula for E oo [Ta ] , note that Ta is Poo -geometrically 
distributed with the "probability of success" 1 — A^. We also remark that, by design, 
the randomized variant T^ of the generic procedure Ta is an equalizer, i.e., Eo[7^] = 
E.[7^ - J^IT^ > H for all u ;? 1. 

Finally, we present Bayesian operating characteristics — the average detection delay 



ADD {Ta) i_pFA-(7-^) 

and the probability of false alarm, PFA''(Ta) = Efcli TrkV^aiTA ^ k). Assuming the 
geometric prior distribution (6), we obtain 

PFA- (TI) = (1 - tt) 1 1 - p f; (1 - p) I , 

oo CXD 

TTfc Efc[(7l - fc)+] = n5o{r) + (1 - ^)p ^(1 - pf5k{r) 



k=0 k=0 
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(cf. Tartakovsky and Moustakides 2010). Let ipp{x) = '^'^^q{1 — p)^5k{x) and Xp{^) = 
X^^g(l — p)^ Pk{x). Using the Markov property of the statistic , it is readily seen that 
^p{x) and Xp{X') satisfy the following integral equations 



The above equations are Fredholm integral equations of the second kind. As a rule, such 
equations do not allow for an (exact) analytical solution. For a few exceptions from the 
rule see Pollak (1985), Mevorach and PoUak (1991), Polunchenko and Tartakovsky (2010), 
and Tartakovsky and Polunchenko (2010). The results of the last two papers are summarized 
and extended in Section 7. Hence, a numerical technique may be in order. A simple numer- 
ical interpolation-projection type scheme has been suggested by Moustakides et al (2011). 
The scheme is effectively a piecewise collocation method with interpolating polynomials 
being of degree zero (constants). Using, e.g., (Atkinson and Han 2009, Theorem 12.1.2) we 
can conclude that the corresponding rate of convergence is at worst linear. 

The above performance evaluation methodology can now be applied to any particular 
scenario we may be interested in. A few such scenarios are worked out in Sections 7 and 8. 

7 Exact optimality of the Shiryaev-Roberts-r procedure 

As we have pointed out earlier, the question of what solves Pollak' s version of the minimax 
optimization problem (18) has been open since its inception in 1985. Because of the third- 
order asymptotic optimality and the fact that it is an equalizer it was conjectured that the SRP 
procedure might be the sought optimum. In this subsection, we suggest two counterexamples 
that disprove this conjecture. These examples show that a) the SRP procedure is not optimal, 
and b) that the SR-r procedure is optimal. We stress that the SR-r procedure is optimal in 
these examples, but not in general. 

As a starting point, observe that equations (42), (43), (45) and (46) are special cases of 
the more general equation 



where v(x) is a given function, u{x) is the sought (unknown) function, and K{x, y), which 
is called the kernel of this equation, is of the form 




The PFA and ADD are then computed, respectively, as 



PFA-(rl) = (l-^){l-pXpW} and ADD-(rl) 



'K5a{x) + (1 - n)pipp{x) 
TT + (1 - Tr)pxp{x) 




(49) 




mthP^ix) being the cdf of the LRA„ = g(X„)//(X„). 

To see that (49) is an "umbrella" equation for all equations we are interested in, note that 
to obtain equation (42), which determines the ARL to false alarm, it suffices to take v{x) = 



24 



Polimcheiiko and Tartakovsky 



1 for all X and IC{x, y) = JCoo{x, y). Likewise, equation (43), which governs the ADD at 
V = 0, can be obtained from (49) by assuming v{x) = 1 for all x and lC{x, y) = ICo{x, y). 
By a similar argument, one can also verify that equations (45) and (46) are instances of (49). 
Thus, if one is able to solve equation (49), one is also able to solve any of the equations of 
interest. 

Suppose now that we have a change-point scenario for which the cdf {t) is such that 

p^[^)=;^ix)yiy) 

for some sufficiently smooth functions X{x) and y{y). In this case, the kernel IC{x, y) is 
separable, i.e., 

so that the variables x and y are separated. 

If the kernel is separable and the interval of integration has constant limits, the above 
equation can be solved analytically, and the solution is u{x) = v{x) + MX{x), where 

M= (^ j\{t)y'{t)dt^ I j\{t)y'{t)dt^ , 

which is a function of A only. 

More important is the fact that in this case 

P(i?I ^ y\S:^ > 1) = P(iil ^ y\R\ <A,Rl=r)= y{y)/y{A), 

i.e., ¥{R\ ^ y|iS^ > 1) does not depend on the starting point Rq = r. This means that the 
quasi-stationary distribution "kicks in" as early as the first observation becomes available. 
As a result, the SR-r procedure is an equalizer fox v \, and the only "degree of freedom" 
is, V = Q. If one now designs the starting point Rq so as to equate the performance of the 
SR-r procedure at i/ = to that sAv ^ 1, then the SR-r procedure will be an equalizer 
for all V Q. Therefore, by Corollary 1, in this case it is minimax. Note that this equalizer 
is different from the SRP rule, which is also an equalizer. The SR-r is an equalizer and 
minimax not in general but only in this particular case, i.e., in the case when the kernel is 
separable. Thus, we now have to find examples where this is true. This will prove that the 
SRP procedure is not strictly minimax in general. 

Suppose now that the observations' distribution is uniform(0, 1) pre-change and beta(2, 1) 
post-change, that is, f{x) = ll{o<x<i} ™d 17(2;) = 22; ]1{o<k<i}- The LR is A„ = 
2Xn ]l{o<x„<i}; observe that A„ £ (0, 2), since X„ € (0, 1). Hence, 



1, ift>2; 

t/2, if0sSt<2; and PQ{t) 
0, otherwise. 



1, ift>2; 
(^/2)^ if0=^t<2; 
0, otherwise. 



It is apparent that both these distributions are monomial and therefore separable. As a result, 
one can compute the required operating characteristics of any SR-type procedure analyt- 
ically. This was done by Tartakovsky and Polunchenko (2010). Another example, where 
f[x) = ll{2,^Qj and g{x) = 2e~^^l{^^Qj, was considered by Polunchenko and 
Tartakovsky (2010). Although this model may seem very different from the uniform(0, 1)- 
to-beta(2, 1) model, it has exactly the same distributions P^{t), d = {0, cxd}. Both papers 
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established the following theorem the proof of which can be found in Polunchenko and 
Tartakovsky (2010). 

Theorem 5 (Polunchenko and Tartakovsky 2010) Lef 7 = 1/(1 - 0.5 log 3) w 2.2. 

(i) If the starting point r in the SR-r procedure is chosen as va = \/l + A — 1 and the 
detection threshold A = A-^ is set to the solution of the transcendental equation 

A + (7 - \)VTTA\og{i + A)- 2(7 - i)VTTa = 0, 

then, for every 7 G (1,7), the ARL to false alarm Eoo [5^] is exactly 7 and the SR-r 
procedure is strictly minimax. That is, 

Jp{Sa) = inf Jp(T) for every 7 G (1,7)- 

TGZi(7) 

(ii) If the detection threshold in the SRP procedure is set to B = B-y = exp{2(l — 
1/7)} — 1, then the ARL to false alarm Ecx)[5^] is exactly 7 and J^p{Sg) is strictly 
greater than J^p{S\) for every 7 G (1,7). Hence, the SRP procedure is suboptimal. 




This theorem is illustrated in Figure 5. Note that the curves in the picture are exact. We 
stress again that while the SR-r procedure is exactly minimax in this example, it is still 
an open question what minimizes Pollak's Jp{T) in general. We conjecture that, in the 
general case, the optimal procedure is based on the deterministic initialization {r„}„^i that 
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depends on time, in which case a threshold A = An may also be a function of time. We also 
note that even if this fact is proved rigorously, finding the sequences {^71(7)} and {Anij)} 
(in every particular case) is an extremely difficult problem. Solving this problem may not be 
worth trying since the difference between the lower bound (30) and the supremum ADD is 
usually small, at least for a moderate (and of course low) false alarm rate. See Moustakides 
et al (201 1) and Figure 9 below. 

We conclude this subsection with a remark concerning exact optimality of the SR-r 
procedure in the class = {T: sup^.Poo(fe < T ^ k + m\T > fc) ^ a}, where 
a £ (0, 1) and m > 1. We first discussed this class in Subsection 3.3, where we mentioned 
that it is "stronger" than the class /\(7), i.e., in general ^^(7) contains A^. It can be easily 
verified that the Poo -distribution of the SR-r stopping time is zero-modified geometric: 



,(A; < 5^ ^ fc + mlS'A > fc) = 1 



■ log(l + A) 



for fc 5s 1; 



A 



[2{l + r) 



log(l + A) 



for fc = 0, 



where m ^ I. Thus, there is a one-to-one correspondence between the classes Z\™ and 
As a result, the SR-r procedure is minimax in the class Z\™ as well. The same is true 
for the exponential model considered by Polunchenko and Tartakovsky (2010). We believe 
that this is the first exact optimality result in the class A^. 



8 Case studies 

This section dissects two specific cases of the iid model (2) to illustrate the performance 
margin between the SR— r and SRP procedures and S^, defined in (28), (29) and (25), 
(26), respectively. 



8.1 Example 1: A beta-to-beta model 

Suppose the pre- and post-change densities are, respectively, 

fi^) = , — 7T- ^fo<x<U and g{x) = — ^'^ ^, ^!o<x<n, 

where S > is a given constant and B(-, ■) is the Beta function. That is, the observations Xn, 
n ^ I are iid beta((5, 5 + 1) -distributed pre-change and iid beta((5 + 1, (5)-distributed post- 
change. This beta((5, 5 + l)-to-beta((5 + 1, 5) model is of interest in the context of studying 
the accuracy of the asymptotic expansions for the performance of the two competing SR- 
type procedures. Specifically, recall that for sufficiently large detection thresholds, 

Eoo[>S^] w A/C-AiQ and ADD^(5^) w y(log A -h x - Coo) for all 5s 0, 

Eoo[5^] « A/C-r and ADDoo(5^) « y(logyl x - Coo), 

where / = Eo[log Ai] is the KuUback-Leibler information number, and x are defined 
in (32), fXQ is the mean of the quasi-stationary distribution, and the constant Coo is defined 
in (33). 
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For the beta(5, 5 + l)-to-beta(5 + 1, 5) model, Coo, C ^nd / are all computable 
analytically for any 5 > 0. This is of much aid in the context of testing the accuracy of 
the asymptotic approximations. Specifically, we first present the exact, explicit formulas for 
each of the needed quantities, assuming arbitrary 5 > 0. We then evaluate the performance 
of the procedures of interest using the methodology of Section 6 and compare the obtained 
performance against that predicted by the asymptotic approximations. 

Observe that A„ = X„/(l — X„) for any S > 0, whence one can readily deduce 
P^{t) = Prf(Ai ^ t), d = {0,oo}. Specifically, the densities p^{t) = dP^(t)/dt, 
d = {0, oo}, can be seen to be 

Ptit) = ^lyl^^ llo>o> and = '-^0^ 11(*>o>, (50) 

i.e., under either measure P^, d = {0,cxd}, the LR's distribution is Beta of type II (also 
known as the Beta prime distribution); the parameters are 6 and 5 + 1 under measure Poo, 
and S + 1 and S under measure Pq. The fact that p^{t) and pp (i) are both Beta prime 
with "mirrored" parameters suggests a certain symmetry embedded in the beta((5, 5 + 1)- 
to-beta((5 + 1, 5) model. Specifically, consider the "dual" beta((5 + 1, 5)-to-beta((5, S + 
1) model. That is, suppose the pre- and post-change distributions - f{x) and g{x) - are 
swapped so that the former is not beta((5, S + 1), but beta((5 -|- 1, S), and the latter is not 
beta(5 -|- 1,6), but beta(5, (5 + 1). A special case of this swapped model (with 5 = 1) 
was considered by Tartakovsky et al (2011). It can be shown, exploiting properties of the 
Beta and Beta prime distributions, that for the swapped beta((5 + 1, (5)-to-beta((5, 5-1-1) 
model, the densities Pd{t), = {0, oo}, are exactly the same as those we just derived for the 
original beta((5, 5 + l)-to-beta(5 + l, 5) model; see (50). Put otherwise, the beta(5, 5 + 1)- 
to-beta(5 + 1, 5) model and the beta(5 + 1, 5)-to-beta(5, 5+1) model are statistically 
indistinguishable, for any 5 > 0. This symmetry entails a few consequences to be demon- 
strated next. 

Consider the stationary distribution Qst{x) = limn_>oo Poo {Rn ^ x) of the SR statis- 
tic {Rn}n^o - The quasi-stationary pdf qsT{x) = dQsT{x)/dx is governed by the equation 

'?ST(a;) = / -^P^ i'rh'] 1ST{y)dy, 
Jq dx \i + yj 

which can be derived from equation (46) for the quasi-stationary pdf, qA{x), by letting 
A — > oo and noticing that lim^i^oo Aa = 1 and lim^^oo <1a{x) = qsTix) (cf. PoUak and 
Siegmund 1986). Using (50), we obtain 

x^~^ f°° (l + y)^'^^ 

^^^(") = B(5 + i,5) X (TT^T^)^^^^^^)''^' 

and the (exact) solution is 

X [l+X) S_l .-1-5 -n 

qsT(x) = B(5T) l{x>o}=5x (1 + a;) ^x>o}, 

which is the pdf of a Beta prime distribution with parameters 5 and 1. Note that qsT{x) ~ 
x~^ as a; — > oo, which agrees with Kesten (1973). 

Next, it can be shown that the pdf q{x) = dQ{x) j dx of distribution Q{x) = Pq (V^oo 
x) is governed by the equation 
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which can be established in a manner similar to that used to derive the above equation for 
qsTix). However, due to the symmetry of the model one can immediately conclude that 
q(x) = qsrix), so that 



q{x) = qsT{x) = Sx^ ^{l + x) ^ * 1 



{x>0} 



(51) 



We now can find 



Co 



(5>/^i(5)+tf^o(5)-tf'o(l), 



where ip'„(a;) = logr(x)/da;"+^ (n > 0) is the poly gamma function and r{x) is 
the Gamma function; also note that 'f'o(l) = —0.577 ... is the negative Euler's constant. 
To find and x, we use the formulas 



C = y exp |- i [F^iSk > 0) + MSk s; 0)] I , 



where x~ = — min(0,a::); cf., e.g., (Woodroofe 1982, Chapters 2 & 3) and (Siegmund 
1985, Chapter VIII). Using the work of Springer and Thompson (1970), after certain ma- 
nipulations we obtain that 



?oiSk ^ 0) 



r'=((5)rfc(r5 + 1) 



'k + l,k 
k + l,k + l 



I 



\ 




where G!^! (-j-) is theMeijer G-function. Note that due to the symmetry of thebeta((5, 5+\)- 
to-beta((5 + 1,5) model, F^{Sk > 0) = PoC^fc ^ 0) for all fe > 1. Hence, 



k = l 



0,5,. 



-5,1 
,5 



which can be evaluated numerically for any 5 > and with any desired accuracy. 
Next, it can be shown that Eg [Zl\ = 2!P'i {5) and 



MSI 



1 ^fc+2,fc 



-5,..., -5, 1, 1 
0,0,5, 



,5 



fe > 1, 



whence 



^=51'i{5) - 



k=0 



1 ^k + 2,k 



-5,..., -5,1,1 
0,0,5, 



.5 



Consider now starting the SR-r procedure off the point Rq = r* for which ADDq (iS^ ) 
and ADDoo('S^ ) are the same (at least approximately). This idea was first brought up 
in Subsection 5.2; recall Figure 4. By (35) and (37), when the ARL to false alarm is suffi- 
ciently large, 

ADDoo(5^)« i(log^ + x-Coo) and ADD o {S'a) ^ j[\og A + x - C{r)], (52) 
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where C(r) = E[log(l+r+T4o)] (see (38)). Hence, equating ADDo(5a) and ADD^{Sa) 
is equivalent to requiring Coo = C{r), and setting Rq to r that solves the equation Coo = 
C(r) results in the desired effect of ADDo(iS^) ~ ADDoo{iS^) (asymptotically). Since 
Coo is already computed, it is left to find C(r). To this end, using (51), we obtain 



where <!>{■, •, •) is the Lerch transcendent. Hence, the equation Coo = C{r), where r is the 
unknown, reduces to 



which can be solved numerically for any desired S > and with any pleased precision. 

We are now in a position to perform particular computations. To remind, we would like 
to test the accuracy of the asymptotic approximations (35) and (36). Clearly, the accuracy is 
the better, the higher the desired level of the ARL to false alarm Eoo [T] = 7. First, we intend 
to try a relatively small value of 7 = 10^, which corresponds to practically high chances of 
sounding a false alarm. We do not expect the asymptotics to kick in for 7 lower than a few 
hundreds. Suppose that 5=1. For this choice of 5 we have: Coo = 7r^/6 w 1.64, / = 1, 
C « 0.425, X « 1.25, and r* w 2 (so that C(r*) = Coo « 1-64). 

The first step is to set thresholds to guarantee the given ARL to false alarm 7. For the 
SR-r procedure, the detection threshold. A, should be set to the solution of the equation 
7 = A/^ — r, which follows from the corresponding asymptotics for Eoo [5^]. Since in 
our case r = r* w 2 and w 0.425, we find that A must be set to about 43. The actual 
(evaluated numerically with very high accuracy) ARL to false alarm with this A is 100.1. 
Hence, the approximation Eoo [5^] ~ A/(^ — r is very accurate, even when 7 = 10^, 
which is equivalent to a relatively high risk of raising a false alarm. For the SRP procedure 
to have EooiiS^] = 10^ the detection threshold. A, should be set to 43 as well; the actual 
ARL to false alarm for this choice of A is 99.6, and the mean, /ig, of the quasi-stationary 
distribution is around 2.6. Hence, the approximation Eoo [5^] ~ A/( — fiq is also very 
accurate. 

We now proceed to examining ADDi.(T) = E^ [T — i^|T > i^] as a function of v ^ 
for the two procedures in consideration. Figure 6 depicts how the sequence ADDi,(T), 
indexed by i/, evolves as 1/ runs from to 20 for the SR-r procedure (with Rq = r* ^ 2) 
and for the SRP procedure. It can be seen that ADDq {S]\) « ADDoo (Sa), as we planned. 
More importantly, note that the SR-r procedure is uniformly (i.e., for all u ^ 0) better than 
the SRP rule, while the difference is small. Starting the SR-r procedure from the point that 
equates the average detection delays at zero and at infinity is practically more convenient, 
as it does not require one to know the lower bound (not to mention the quasi-stationary 
distribution). As this example illustrates, it may also be sufficient to outperform the SRP 
procedure (though for this example the gain is practically negligible). 

We now turn to the accuracy of the asymptotic approximations for the average detection 
delays (52). According to these approximations for both procedures the worst ADD is about 
2.9 (note that both procedures have the same threshold). However, the actual ADD-s are 
3.54 for the SRP procedure and 3.52 for the SR-r procedure. Hence, the approximations 
are not too accurate, which is because the ARL to false alarm is only 100. 

Consider now setting 5 to 5. Since 1=1/6 this is a less contrast change than 5 = 
1. Consequently, the ADD-s should be higher, which can be used to better illustrate the 
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Fig. 6 Conditional average detection delay [S\ — v\S\ > i^] vs. change-point v for the SRP procedure 
and for the SR-r procedure with r = r* ^ 2 for the beta(<5, 5 + l)-to-beta((5 + 1, i5) model with 5 = 1. 
The ARL to false alarm Eoo [T] = 7 is approximately 100 for each procedure. 



accuracy of their respective approximations. For 5 = 5, we have I = 0.2, Coc ~ 3.19, 
C « 0.685, X « 0.435, and r* « 11. Let 7 = 5 x 10^. To have this level of the ARL to false 
alarm, the threshold for the SR-r procedure should be set to 3452 (the actual ARL to false 
alarm for this threshold is 4999.3), and for the SRP procedure - to 3462 (the actual ARL 
to false alarm for this threshold is 5000.1, and fiq w 26.1). Again, both approximations 
EooiiSA] ~ ^/C ~ ^nd Eoo[iS^] w yl/C — /iQ are highly accurate. We now look at the 
delays. Figure 7 shows the average delay to detection ADD^(T) versus the changepoint u 
for the SR-r procedure with i?5 = r* w 11 and for the SRP procedure. It can be seen 
that again ADDo(iS^) w ADDoo(iS^)- Furthermore, the SR-r procedure is almost an 
equalizer: there is a tiny mound raising above the SRP's flat line, though the mound is 
comparable in magnitude to the numerical error, and therefore, can be disregarded from a 
practical point of view. Both procedures are equally efficient, but since the SR-r procedure 
is easier to initialize it is preferable for practical purposes. 

In terms of the accuracy the actual ADDo(iS^) is 27, while that predicted by the approx- 
imation is 27. The actual value of ADDoo('Sa) is 27.1 versus the approximated value 27 
(which is the same as the value predicted for ADDo(5^), because C(r*) = Coo). Lastly, 
for the SRP procedure the actual average delay is 27. 1, while the predicted using the asymp- 
totic approximation value is 27. As we can see, the approximations for the ADD-s are now 
accurate. The reason is that the ARL to false alarm is relatively high. 
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Fig. 7 Conditional average detection delay Ei/ [<S^ — v\S\ > u\ vs. change-point v for the SRP procedure 
and for the SR-r procedure with r = r* 11 for the beta((5, 5 + l)-to-beta (<5 + 1, <5) model with <5 = 5. 
The ARL to false alarm Eoo [T] = 7 is approximately 5 X 10'' for each procedure. 



To draw a line under this example, the main conclusion is that the SR-r procedure 
is almost equalizer, and its performance is almost indistinguishable from that of the SRP 
procedure. However, it is easier to implement in practice, which is contrary to the SRP 
procedure. Hence, we recommend the SR-r procedure for practical purposes. 



8.2 Example 2: An exponential scenario 

Suppose the sequence {X„}„j,i is comprised by the exponentially distributed random vari- 
ables that undergo a shift in the mean from 1 to 1 + 6, where 6 > Q. Formally, the pre- and 
post-change densities in this case are 

/(x) = exp{-x} l{^^o} and ^(a;) = ^xp |-y^| l{x^o}, 

respectively. We refer to this model as the £(l)-to-£(l -|- 0) model. 

This model was considered by Tartakovsky et al (2009) for 6 = 0.1, which corresponds 
to a small, not easily detectable change. Using the numerical framework of Moustakides et al 
(201 1), also presented in Section 6, they earned out a performance analysis of CUSUM, the 
SRP procedure and the SR-r procedure comparing each against the other. They also com- 
puted the lower bound. We present an excerpt of results for the SRP and SR-r procedures 
along with the lower bound. The accuracy is within 0.5%. 
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Figure 8 shows operating characteristics in terms of Pollak's supremum conditional av- 
erage detection delay Jp{T) = sup^ E^[T — i/|T > i^] as a function of the ARL to false 
alarm Eoo [T\ = 7, plus the lower bound J7b(T). It can be seen that the best performance 
is delivered by the SR-r procedure. This is expected since by design the SR-r rule is the 
closest to the lower bound J'b{T). This suggests that the (unknown) optimal procedure can 
offer only a practically insignificant improvement over the SR-r procedure. 

Next, Figure 9 shows the behavior of the stationary average detection delay ^st{T) 
against the ARL to false alarm. Since the SR procedure is exactly optimal with respect 
to >Jst{T) its performance is the best among the three procedures, but the difference is 
relatively small. Note also that for the SRP procedure j7p(iS^) is the same as J7st('5^), 
since the SRP procedure is an equalizer. 
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Fig. 8 The lower bound Jb{T) and Pollak's Jp{T) for the SRP and SR-r procedures for the £'(l)-to- 
£(1 + d) model with Q = 0.1. The ARL to false alarm is between 5 X 10^ and 10"*. 
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Fig. 9 The stationary average detection delay i7st{2^) for the SRP and SR-r procedures for the £'(l)-to- 
E{\ + 6») model with Q = 0.1. The ARL to false alarm is between 5 X 10^ and 10"*. 
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